[Update] CloudFormation now supports AWS Parallel Computing Service
Hello! This is Takakuni (@takakuni_) from the Consulting Department, AWS Business Division.
CloudFormation now supports AWS Parallel Computing Service.
AWS Parallel Computing Service (PCS), which appeared suddenly around summer 2024, can now be expressed as IaC, so testing it should go much faster. (My blogging may speed up too.)
What's new
CloudFormation now supports the following resources:
- AWS::PCS::Cluster
- AWS::PCS::ComputeNodeGroup
- AWS::PCS::Queue
Let's create the resources right away. This time I followed the Getting Started guide.
VPC
I'll skip over a lot of detail here, but the Getting Started guide provides CloudFormation templates for the VPC and for the security groups used by PCS.
I reused those templates to create the resources.
main.yaml (VPC portion)
I followed the procedure in the guide. (I named the stack pcs-blog.)
AWSTemplateFormatVersion: '2010-09-09'
Description: Create public and private subnets in two or three AZs. Specified CIDR blocks allow 4096 IPs each.
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: VPC
        Parameters:
          - CidrBlock
      - Label:
          default: Subnets A
        Parameters:
          - CidrPublicSubnetA
          - CidrPrivateSubnetA
      - Label:
          default: Subnets B
        Parameters:
          - CidrPublicSubnetB
          - CidrPrivateSubnetB
      - Label:
          default: Subnets C
        Parameters:
          - ProvisionSubnetsC
          - CidrPublicSubnetC
          - CidrPrivateSubnetC
Parameters:
  CidrBlock:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.0.0/16
    Description: VPC CIDR Block (eg 10.3.0.0/16)
    Type: String
  CidrPublicSubnetA:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.0.0/20
    Description: VPC CIDR Block for the Public Subnet A
    Type: String
  CidrPublicSubnetB:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.16.0/20
    Description: VPC CIDR Block for the Public Subnet B
    Type: String
  CidrPublicSubnetC:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.32.0/20
    Description: VPC CIDR Block for the Public Subnet C
    Type: String
  CidrPrivateSubnetA:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.128.0/20
    Description: VPC CIDR Block for the Private Subnet A
    Type: String
  CidrPrivateSubnetB:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.144.0/20
    Description: VPC CIDR Block for the Private Subnet B
    Type: String
  CidrPrivateSubnetC:
    AllowedPattern: '((\d{1,3})\.){3}\d{1,3}/\d{1,2}'
    Default: 10.3.160.0/20
    Description: VPC CIDR Block for the Private Subnet C
    Type: String
  ProvisionSubnetsC:
    Type: String
    Description: Provision optional 3rd set of subnets
    Default: 'True'
    AllowedValues:
      - 'True'
      - 'False'
Mappings:
  RegionMap:
    us-east-1:
      ZoneId1: use1-az6
      ZoneId2: use1-az4
      ZoneId3: use1-az5
    us-east-2:
      ZoneId1: use2-az2
      ZoneId2: use2-az3
      ZoneId3: use2-az1
    us-west-1:
      ZoneId1: usw1-az1
      ZoneId2: usw1-az3
      ZoneId3: usw1-az2
    us-west-2:
      ZoneId1: usw2-az1
      ZoneId2: usw2-az2
      ZoneId3: usw2-az3
    eu-central-1:
      ZoneId1: euc1-az3
      ZoneId2: euc1-az2
      ZoneId3: euc1-az1
    eu-west-1:
      ZoneId1: euw1-az1
      ZoneId2: euw1-az2
      ZoneId3: euw1-az3
    eu-west-2:
      ZoneId1: euw2-az2
      ZoneId2: euw2-az3
      ZoneId3: euw2-az1
    eu-west-3:
      ZoneId1: euw3-az1
      ZoneId2: euw3-az2
      ZoneId3: euw3-az3
    eu-north-1:
      ZoneId1: eun1-az2
      ZoneId2: eun1-az1
      ZoneId3: eun1-az3
    ca-central-1:
      ZoneId1: cac1-az2
      ZoneId2: cac1-az1
      ZoneId3: cac1-az3
    eu-south-1:
      ZoneId1: eus1-az2
      ZoneId2: eus1-az1
      ZoneId3: eus1-az3
    ap-east-1:
      ZoneId1: ape1-az3
      ZoneId2: ape1-az2
      ZoneId3: ape1-az1
    ap-northeast-1:
      ZoneId1: apne1-az4
      ZoneId2: apne1-az1
      ZoneId3: apne1-az2
    ap-northeast-2:
      ZoneId1: apne2-az1
      ZoneId2: apne2-az3
      ZoneId3: apne2-az2
    ap-south-1:
      ZoneId1: aps1-az2
      ZoneId2: aps1-az3
      ZoneId3: aps1-az1
    ap-southeast-1:
      ZoneId1: apse1-az1
      ZoneId2: apse1-az2
      ZoneId3: apse1-az3
    ap-southeast-2:
      ZoneId1: apse2-az3
      ZoneId2: apse2-az1
      ZoneId3: apse2-az2
    us-gov-west-1:
      ZoneId1: usgw1-az1
      ZoneId2: usgw1-az2
      ZoneId3: usgw1-az3
    ap-northeast-3:
      ZoneId1: apne3-az3
      ZoneId2: apne3-az2
      ZoneId3: apne3-az1
    sa-east-1:
      ZoneId1: sae1-az3
      ZoneId2: sae1-az2
      ZoneId3: sae1-az1
    af-south-1:
      ZoneId1: afs1-az3
      ZoneId2: afs1-az2
      ZoneId3: afs1-az1
    ap-south-2:
      ZoneId1: aps2-az3
      ZoneId2: aps2-az2
      ZoneId3: aps2-az1
    ap-southeast-3:
      ZoneId1: apse3-az3
      ZoneId2: apse3-az2
      ZoneId3: apse3-az1
    ap-southeast-4:
      ZoneId1: apse4-az3
      ZoneId2: apse4-az2
      ZoneId3: apse4-az1
    ca-west-1:
      ZoneId1: caw1-az3
      ZoneId2: caw1-az2
      ZoneId3: caw1-az1
    eu-central-2:
      ZoneId1: euc2-az3
      ZoneId2: euc2-az2
      ZoneId3: euc2-az1
    eu-south-2:
      ZoneId1: eus2-az3
      ZoneId2: eus2-az2
      ZoneId3: eus2-az1
    il-central-1:
      ZoneId1: ilc1-az3
      ZoneId2: ilc1-az2
      ZoneId3: ilc1-az1
    me-central-1:
      ZoneId1: mec1-az3
      ZoneId2: mec1-az2
      ZoneId3: mec1-az1
Conditions:
  DoProvisionSubnetsC: !Equals [!Ref ProvisionSubnetsC, 'True']
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref CidrBlock
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: 'Name'
          Value: !Sub '${AWS::StackName}:Large-Scale-HPC'
  VPCFlowLog:
    Type: AWS::EC2::FlowLog
    Properties:
      ResourceId: !Ref VPC
      ResourceType: VPC
      TrafficType: ALL
      LogDestinationType: cloud-watch-logs
      LogGroupName: !Sub '${AWS::StackName}-VPCFlowLogs'
      DeliverLogsPermissionArn: !GetAtt FlowLogRole.Arn
  FlowLogRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - vpc-flow-logs.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - !Ref AWS::NoValue
      Policies:
        - PolicyName: FlowLogPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                  - 'logs:DescribeLogGroups'
                  - 'logs:DescribeLogStreams'
                Resource: !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:${AWS::StackName}-VPCFlowLogs:*'
  PublicSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Ref CidrPublicSubnetA
      AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PublicSubnetA-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName
  PublicSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Ref CidrPublicSubnetB
      AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PublicSubnetB-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName
  PublicSubnetC:
    Type: AWS::EC2::Subnet
    Condition: DoProvisionSubnetsC
    Properties:
      VpcId: !Ref VPC
      CidrBlock: !Ref CidrPublicSubnetC
      AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PublicSubnetC-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PublicRoute'
  PublicRoute1:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PublicSubnetARouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetA
      RouteTableId: !Ref PublicRouteTable
  PublicSubnetBRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetB
      RouteTableId: !Ref PublicRouteTable
  PublicSubnetCRouteTableAssociation:
    Condition: DoProvisionSubnetsC
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetC
      RouteTableId: !Ref PublicRouteTable
  PrivateSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName
      CidrBlock: !Ref CidrPrivateSubnetA
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PrivateSubnetA-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone1.ZoneName
  PrivateSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName
      CidrBlock: !Ref CidrPrivateSubnetB
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PrivateSubnetB-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone2.ZoneName
  PrivateSubnetC:
    Type: AWS::EC2::Subnet
    Condition: DoProvisionSubnetsC
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName
      CidrBlock: !Ref CidrPrivateSubnetC
      MapPublicIpOnLaunch: false
      Tags:
        - Key: Name
          Value: !Sub
            - '${StackName}:PrivateSubnetC-${AvailabilityZone}'
            - StackName: !Ref AWS::StackName
              AvailabilityZone: !GetAtt AvailabiltyZone3.ZoneName
  NatGatewayAEIP:
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc
  NatGatewayBEIP:
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc
  NatGatewayCEIP:
    Condition: DoProvisionSubnetsC
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc
  NatGatewayA:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGatewayAEIP.AllocationId
      SubnetId: !Ref PublicSubnetA
  NatGatewayB:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGatewayBEIP.AllocationId
      SubnetId: !Ref PublicSubnetB
  NatGatewayC:
    Type: AWS::EC2::NatGateway
    Condition: DoProvisionSubnetsC
    Properties:
      AllocationId: !GetAtt NatGatewayCEIP.AllocationId
      SubnetId: !Ref PublicSubnetC
  PrivateRouteTableA:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PrivateRouteA'
  PrivateRouteTableB:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PrivateRouteB'
  PrivateRouteTableC:
    Type: AWS::EC2::RouteTable
    Condition: DoProvisionSubnetsC
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}:PrivateRouteC'
  DefaultPrivateRouteA:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableA
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayA
  DefaultPrivateRouteB:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableB
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayB
  DefaultPrivateRouteC:
    Type: AWS::EC2::Route
    Condition: DoProvisionSubnetsC
    Properties:
      RouteTableId: !Ref PrivateRouteTableC
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayC
  PrivateSubnetARouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTableA
      SubnetId: !Ref PrivateSubnetA
  PrivateSubnetBRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTableB
      SubnetId: !Ref PrivateSubnetB
  PrivateSubnetCRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: DoProvisionSubnetsC
    Properties:
      RouteTableId: !Ref PrivateRouteTableC
      SubnetId: !Ref PrivateSubnetC
  AvailabiltyZone1:
    Type: Custom::AvailabiltyZone
    DependsOn: LogGroupGetAZLambdaFunction
    Properties:
      ServiceToken: !GetAtt GetAZLambdaFunction.Arn
      ZoneId: !FindInMap [RegionMap, !Ref 'AWS::Region', ZoneId1]
  AvailabiltyZone2:
    Type: Custom::AvailabiltyZone
    DependsOn: LogGroupGetAZLambdaFunction
    Properties:
      ServiceToken: !GetAtt GetAZLambdaFunction.Arn
      ZoneId: !FindInMap [RegionMap, !Ref 'AWS::Region', ZoneId2]
  AvailabiltyZone3:
    Type: Custom::AvailabiltyZone
    Condition: DoProvisionSubnetsC
    DependsOn: LogGroupGetAZLambdaFunction
    Properties:
      ServiceToken: !GetAtt GetAZLambdaFunction.Arn
      ZoneId: !FindInMap [RegionMap, !Ref 'AWS::Region', ZoneId3]
  LogGroupGetAZLambdaFunction:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: !Sub /aws/lambda/${GetAZLambdaFunction}
      RetentionInDays: 7
  GetAZLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Description: GetAZLambdaFunction
      Timeout: 60
      Runtime: python3.9
      Handler: index.handler
      Role: !GetAtt GetAZLambdaRole.Arn
      Code:
        ZipFile: |
          import cfnresponse
          from json import dumps
          from boto3 import client
          EC2 = client('ec2')
          def handler(event, context):
              if event['RequestType'] in ('Create', 'Update'):
                  print(dumps(event, default=str))
                  data = {}
                  try:
                      # Resolve the account-independent zone ID to this account's zone name
                      response = EC2.describe_availability_zones(
                          Filters=[{'Name': 'zone-id', 'Values': [event['ResourceProperties']['ZoneId']]}]
                      )
                      print(dumps(response, default=str))
                      data['ZoneName'] = response['AvailabilityZones'][0]['ZoneName']
                  except Exception as error:
                      # Report the failure once; no SUCCESS response follows
                      cfnresponse.send(event, context, cfnresponse.FAILED, {}, reason=str(error))
                  else:
                      # Only reached when the lookup succeeded
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
              else:
                  cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}GetAZLambdaFunction
  GetAZLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      Path: /
      Description: GetAZLambdaFunction
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sts:AssumeRole
            Principal:
              Service:
                - !Sub 'lambda.${AWS::URLSuffix}'
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
      Policies:
        - PolicyName: GetAZLambdaFunction
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Sid: ec2
                Effect: Allow
                Action:
                  - ec2:DescribeAvailabilityZones
                Resource:
                  - '*'
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-GetAZLambdaFunction
  S3Endpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      VpcEndpointType: 'Gateway'
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      RouteTableIds:
        - !Ref PublicRouteTable
        - !Ref PrivateRouteTableA
        - !Ref PrivateRouteTableB
      VpcId: !Ref VPC
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all traffic from resources in VPC
      VpcId:
        Ref: VPC
      SecurityGroupIngress:
        - IpProtocol: -1
          CidrIp: !Ref CidrBlock
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: !Ref CidrBlock
Outputs:
  VPC:
    Value: !Ref VPC
    Description: ID of the VPC
    Export:
      Name: !Sub ${AWS::StackName}-VPC
  PublicSubnets:
    Value: !Join
      - ','
      - - !Ref PublicSubnetA
        - !Ref PublicSubnetB
        - !If
          - DoProvisionSubnetsC
          - !Ref PublicSubnetC
          - !Ref AWS::NoValue
    Description: ID of the public subnets
    Export:
      Name: !Sub ${AWS::StackName}-PublicSubnets
  PrivateSubnets:
    Value: !Join
      - ','
      - - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
        - !If
          - DoProvisionSubnetsC
          - !Ref PrivateSubnetC
          - !Ref AWS::NoValue
    Description: ID of the private subnets
    Export:
      Name: !Sub ${AWS::StackName}-PrivateSubnets
  DefaultPrivateSubnet:
    Description: The ID of a default private subnet
    Value: !Ref PrivateSubnetA
    Export:
      Name: !Sub '${AWS::StackName}-DefaultPrivateSubnet'
  DefaultPublicSubnet:
    Description: The ID of a default public subnet
    Value: !Ref PublicSubnetA
    Export:
      Name: !Sub '${AWS::StackName}-DefaultPublicSubnet'
  InternetGatewayId:
    Description: The ID of the Internet Gateway
    Value: !Ref InternetGateway
    Export:
      Name: !Sub '${AWS::StackName}-InternetGateway'
  SecurityGroup:
    Description: The ID of the local security group
    Value: !Ref SecurityGroup
    Export:
      Name: !Sub '${AWS::StackName}-SecurityGroup'
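The template description says each specified CIDR block allows 4096 IPs. As a quick sanity check of the default parameter values, here is a small Python sketch using the standard `ipaddress` module. It confirms that every default /20 subnet sits inside the 10.3.0.0/16 VPC block and that no two subnets overlap, and it reports usable addresses after the 5 that AWS reserves in every subnet. The dictionary simply copies the template defaults; `check_layout` is a hypothetical helper, not part of the template.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations

# Default values copied from the Parameters section of main.yaml
VPC_CIDR = "10.3.0.0/16"
SUBNET_CIDRS = {
    "CidrPublicSubnetA": "10.3.0.0/20",
    "CidrPublicSubnetB": "10.3.16.0/20",
    "CidrPublicSubnetC": "10.3.32.0/20",
    "CidrPrivateSubnetA": "10.3.128.0/20",
    "CidrPrivateSubnetB": "10.3.144.0/20",
    "CidrPrivateSubnetC": "10.3.160.0/20",
}

# AWS reserves the first four addresses and the last address of every subnet
AWS_RESERVED_PER_SUBNET = 5


def check_layout(vpc_cidr: str, subnets: dict[str, str]) -> dict[str, int]:
    """Return usable IPs per subnet; raise ValueError on a bad layout."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    # Every subnet must fall inside the VPC block
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside {vpc}")
    # No two subnets may share addresses
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{name_a} overlaps {name_b}")
    return {name: net.num_addresses - AWS_RESERVED_PER_SUBNET for name, net in nets.items()}


if __name__ == "__main__":
    for name, usable in check_layout(VPC_CIDR, SUBNET_CIDRS).items():
        print(f"{name}: {SUBNET_CIDRS[name]} -> {usable} usable IPs")
```

If you override the CIDR parameters when creating the stack, running the same check on your own values before deploying can save a failed stack.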
pcs-cluster-sg.yaml
I followed the procedure in the guide. (I named the stack pcs-blog-sg.)
AWSTemplateFormatVersion: 2010-09-09
Description: Security group for communications between AWS PCS controller, compute nodes, and client nodes, plus optional inbound SSH security group.
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Network
        Parameters:
          - VpcId
      - Label:
          default: Security group configuration
        Parameters:
          - CreateInboundSshSecurityGroup
          - ClientIpCidr
Parameters:
  VpcId:
    Description: VPC where the AWS PCS cluster will be deployed
    Type: 'AWS::EC2::VPC::Id'
  ClientIpCidr:
    Description: IP address(s) allowed to connect to nodes using SSH
    Default: '0.0.0.0/0'
    Type: String
    AllowedPattern: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})
    ConstraintDescription: Value must be a valid IP or network range of the form x.x.x.x/x.
  CreateInboundSshSecurityGroup:
    Description: Create an inbound security group to allow SSH access to nodes.
    Type: String
    Default: 'True'
    AllowedValues:
      - 'True'
      - 'False'
Conditions:
  CreateSshSecGroup: !Equals [!Ref CreateInboundSshSecurityGroup, 'True']
Resources:
  ClusterSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Supports communications between AWS PCS controller, compute nodes, and client nodes
      VpcId: !Ref VpcId
      GroupName: !Sub 'cluster-${AWS::StackName}'
  ClusterAllowAllInboundFromSelf:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref ClusterSecurityGroup
      IpProtocol: '-1'
      SourceSecurityGroupId: !Ref ClusterSecurityGroup
  ClusterAllowAllOutboundToSelf:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref ClusterSecurityGroup
      IpProtocol: '-1'
      DestinationSecurityGroupId: !Ref ClusterSecurityGroup
  # This allows all outbound comms, which enables HTTPS calls and connections to networked storage
  ClusterAllowAllOutboundToWorld:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref ClusterSecurityGroup
      IpProtocol: '-1'
      CidrIp: 0.0.0.0/0
  # Attach this to login nodes to enable inbound SSH access.
  InboundSshSecurityGroup:
    Condition: CreateSshSecGroup
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allows inbound SSH access
      GroupName: !Sub 'inbound-ssh-${AWS::StackName}'
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: !Ref ClientIpCidr
Outputs:
  ClusterSecurityGroupId:
    Description: Supports communication between PCS controller, compute nodes, and login nodes
    Value: !Ref ClusterSecurityGroup
  InboundSshSecurityGroupId:
    # Only emitted when the SSH group exists; otherwise !Ref would point at a missing resource
    Condition: CreateSshSecGroup
    Description: Enables SSH access to login nodes
    Value: !Ref InboundSshSecurityGroup
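The cluster security group references itself for both ingress and egress, so any instance carrying that group can reach any other instance carrying it on every protocol and port, while SSH from outside is opened only by the optional second group. The Python sketch below is a simplified mental model of those inbound rules, not how AWS actually evaluates security groups (real groups are stateful, and egress is ignored here). The group labels `cluster-sg` and `inbound-ssh-sg` and the IP addresses are hypothetical.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str                    # "-1" means all protocols, as in the template
    port: int | None = None          # None means all ports
    source_group: str | None = None  # allow traffic from members of this group
    source_cidr: str | None = None   # or from this address range


@dataclass
class Instance:
    groups: set[str]
    ip: str = "10.3.128.10"


# Simplified inbound rules from pcs-cluster-sg.yaml (group names are hypothetical labels)
RULES = {
    "cluster-sg": [IngressRule("-1", source_group="cluster-sg")],
    "inbound-ssh-sg": [IngressRule("tcp", 22, source_cidr="0.0.0.0/0")],
}


def inbound_allowed(dst: Instance, src: Instance, protocol: str, port: int) -> bool:
    """True if any group attached to dst has a rule admitting this traffic."""
    for group in dst.groups:
        for rule in RULES.get(group, []):
            if rule.protocol not in ("-1", protocol):
                continue
            if rule.port is not None and rule.port != port:
                continue
            if rule.source_group is not None and rule.source_group in src.groups:
                return True
            if rule.source_cidr is not None and ipaddress.ip_address(src.ip) in ipaddress.ip_network(rule.source_cidr):
                return True
    return False


if __name__ == "__main__":
    login = Instance({"cluster-sg", "inbound-ssh-sg"})
    compute = Instance({"cluster-sg"})
    laptop = Instance(set(), ip="203.0.113.5")
    print(inbound_allowed(compute, login, "tcp", 6818))  # cluster members talk freely
    print(inbound_allowed(login, laptop, "tcp", 22))     # SSH only via the SSH group
    print(inbound_allowed(compute, laptop, "tcp", 22))   # compute nodes are not exposed
```

This is also why the launch templates attach the SSH group only to login nodes: compute nodes stay reachable solely from other cluster members.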
EFS
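In the guide, the EFS step is done in the console, so I created the file system by hand, with that warm handcrafted touch.
Make a note of the File system ID. (I did not create a Lustre file system this time.)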
AWS::PCS::Cluster
Now we create the PCS cluster itself. This is the fun part.
Specify the security group IDs and subnet IDs created earlier.
I skipped SlurmConfiguration. I also skipped Tags: its type is listed as String, and embedding a List caused an error.
AWSTemplateFormatVersion: 2010-09-09
Description: Sample template
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'Network'
        Parameters:
          - SecurityGroupIds
          - SubnetIds
Parameters:
  SecurityGroupIds:
    Type: List<AWS::EC2::SecurityGroup::Id>
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
Resources:
  Cluster:
    Type: AWS::PCS::Cluster
    Properties:
      Name: !Sub 'cluster-${AWS::StackName}'
      Networking:
        SecurityGroupIds: !Ref SecurityGroupIds
        SubnetIds: !Ref SubnetIds
      Scheduler:
        Type: 'SLURM'
        Version: '24.05'
      Size: 'SMALL'
      # Tags:
      #   - Key: Name
      #     Value: !Sub '${Prefix}-cluster'
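Tags was skipped above because the usual `- Key/Value` list form errored. One possibility, which is an assumption to verify against the AWS::PCS::Cluster resource reference and which I have not tested here, is that PCS expects Tags as a plain key-value map (for example `Tags: {"Name": "pcs-blog-cluster"}`) rather than a list of Key/Value objects. If so, the familiar list form only needs converting. The hypothetical helper below performs that conversion and prints the result as a YAML flow mapping.

```python
from __future__ import annotations

import json


def tags_list_to_map(tags: list[dict[str, str]]) -> dict[str, str]:
    """Convert [{'Key': k, 'Value': v}, ...] into {k: v, ...}.

    Raises ValueError on duplicate keys, since a map cannot hold both values.
    """
    result: dict[str, str] = {}
    for tag in tags:
        key, value = tag["Key"], tag["Value"]
        if key in result:
            raise ValueError(f"duplicate tag key: {key}")
        result[key] = value
    return result


if __name__ == "__main__":
    # "pcs-blog-cluster" is a placeholder tag value
    tag_map = tags_list_to_map([{"Key": "Name", "Value": "pcs-blog-cluster"}])
    # JSON objects are valid YAML flow mappings, so this line can be pasted into a template
    print(f"Tags: {json.dumps(tag_map)}")
```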
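Creating the instance profile
Next, create the IAM role and instance profile to assign to the node groups.
No template was provided for this one, so I wrote it myself.
It grants AmazonSSMManagedInstanceCore plus permission to register instances with the node group (pcs:RegisterComputeNodeGroupInstance).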
AWSTemplateFormatVersion: 2010-09-09
Description: Sample template
Resources:
  #################################################
  # IAM role and instance profile for PCS nodes
  #################################################
  Role:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub 'cluster-role-${AWS::StackName}'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: !Sub 'cluster-policy-${AWS::StackName}'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'pcs:RegisterComputeNodeGroupInstance'
                Resource: '*'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
  InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref Role
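The node group template later takes `IamInstanceProfileArn` as a parameter, but this template defines no Outputs. Inside CloudFormation, `!GetAtt InstanceProfile.Arn` returns the ARN directly and could be exported from an Outputs section. If you assemble it by hand instead, instance profile ARNs have the form `arn:<partition>:iam::<account-id>:instance-profile<path><name>`, where the path (here `/`) is part of the ARN. Because this template sets no `InstanceProfileName`, the name is generated by CloudFormation and has to be looked up first. The sketch below builds the string; the account ID and profile name in the usage are placeholders.

```python
import re


def instance_profile_arn(account_id: str, name: str, path: str = "/", partition: str = "aws") -> str:
    """Build an IAM instance profile ARN from its parts.

    IAM paths must start and end with '/', and the path is embedded in the ARN.
    """
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be 12 digits")
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    return f"arn:{partition}:iam::{account_id}:instance-profile{path}{name}"


if __name__ == "__main__":
    # Placeholder account ID and generated-looking profile name
    print(instance_profile_arn("123456789012", "pcs-blog-iam-InstanceProfile-EXAMPLE"))
```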
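Launch templates
Node groups are launched from launch templates, so we create those next.
A template was provided for this step, so I reused it.
However, it needs an existing key pair, so I created one in the console beforehand.
Note that the user data contains the EFS and Lustre mount steps.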
AWSTemplateFormatVersion: 2010-09-09
Description: Launch templates for AWS PCS login and compute node groups, supporting shared EFS and FSx for Lustre file systems
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Security
        Parameters:
          - VpcDefaultSecurityGroupId
          - ClusterSecurityGroupId
          - SshSecurityGroupId
          - SshKeyName
      - Label:
          default: File systems
        Parameters:
          - EfsFilesystemId
          - FSxLustreFilesystemId
          - FSxLustreFilesystemMountName
Parameters:
  VpcDefaultSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Cluster VPC 'default' security group. Make sure you choose the one from your cluster VPC!
  ClusterSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security group for PCS cluster controller and nodes.
  SshSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security group for SSH into login nodes
  SshKeyName:
    Type: AWS::EC2::KeyPair::KeyName
    Description: SSH key name for access to login nodes
  EfsFilesystemId:
    Type: String
    Description: Amazon EFS file system Id
  FSxLustreFilesystemId:
    Type: String
    Description: Amazon FSx for Lustre file system Id
  FSxLustreFilesystemMountName:
    Type: String
    Description: Amazon FSx for Lustre mount name
Resources:
  LoginLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Sub 'login-${AWS::StackName}'
      LaunchTemplateData:
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: HPCRecipes
                Value: 'true'
        MetadataOptions:
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 4
          HttpTokens: required
        KeyName: !Ref SshKeyName
        SecurityGroupIds:
          - !Ref ClusterSecurityGroupId
          - !Ref SshSecurityGroupId
          - !Ref VpcDefaultSecurityGroupId
        UserData:
          Fn::Base64: !Sub |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

            --==MYBOUNDARY==
            Content-Type: text/cloud-config; charset="us-ascii"
            MIME-Version: 1.0

            packages:
            - amazon-efs-utils

            runcmd:
            # Mount EFS filesystem as /home
            - mkdir -p /tmp/home
            - rsync -aA /home/ /tmp/home
            - echo "${EfsFilesystemId}:/ /home efs tls,_netdev" >> /etc/fstab
            - mount -a -t efs defaults
            - if [ "enabled" == "$(sestatus | awk '/^SELinux status:/{print $3}')" ]; then setsebool -P use_nfs_home_dirs 1; fi
            - rsync -aA --ignore-existing /tmp/home/ /home
            - rm -rf /tmp/home/
            # If provided, mount FSxL filesystem as /shared
            - if [ ! -z "${FSxLustreFilesystemId}" ]; then amazon-linux-extras install -y lustre=latest; mkdir -p /shared; chmod a+rwx /shared; mount -t lustre ${FSxLustreFilesystemId}.fsx.${AWS::Region}.amazonaws.com@tcp:/${FSxLustreFilesystemMountName} /shared; chmod 777 /shared; fi

            --==MYBOUNDARY==--
  ComputeLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Sub 'compute-${AWS::StackName}'
      LaunchTemplateData:
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: HPCRecipes
                Value: 'true'
        MetadataOptions:
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 4
          HttpTokens: required
        SecurityGroupIds:
          - !Ref ClusterSecurityGroupId
          - !Ref VpcDefaultSecurityGroupId
        KeyName: !Ref SshKeyName
        UserData:
          Fn::Base64: !Sub |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

            --==MYBOUNDARY==
            Content-Type: text/cloud-config; charset="us-ascii"
            MIME-Version: 1.0

            packages:
            - amazon-efs-utils

            runcmd:
            # Mount EFS filesystem as /home
            - mkdir -p /tmp/home
            - rsync -aA /home/ /tmp/home
            - echo "${EfsFilesystemId}:/ /home efs tls,_netdev" >> /etc/fstab
            - mount -a -t efs defaults
            - if [ "enabled" == "$(sestatus | awk '/^SELinux status:/{print $3}')" ]; then setsebool -P use_nfs_home_dirs 1; fi
            - rsync -aA --ignore-existing /tmp/home/ /home
            - rm -rf /tmp/home/
            # If provided, mount FSxL filesystem as /shared
            - if [ ! -z "${FSxLustreFilesystemId}" ]; then amazon-linux-extras install -y lustre=latest; mkdir -p /shared; chmod a+rwx /shared; mount -t lustre ${FSxLustreFilesystemId}.fsx.${AWS::Region}.amazonaws.com@tcp:/${FSxLustreFilesystemMountName} /shared; fi

            --==MYBOUNDARY==--
Outputs:
  LoginLaunchTemplateId:
    Description: 'Login nodes template ID'
    Value: !Ref LoginLaunchTemplate
  LoginLaunchTemplateName:
    Description: 'Login nodes template name'
    Value: !Sub 'login-${AWS::StackName}'
  ComputeLaunchTemplateId:
    Description: 'Compute nodes template ID'
    Value: !Ref ComputeLaunchTemplate
  ComputeLaunchTemplateName:
    Description: 'Compute nodes template name'
    Value: !Sub 'compute-${AWS::StackName}'
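The UserData above is a MIME `multipart/mixed` document wrapping a single `text/cloud-config` part, which cloud-init unpacks at boot. To see how that structure fits together, here is a Python sketch using the standard library's `email` package to build an equivalent message. It is an illustration only: the template's `!Sub` literal is what EC2 actually receives, the runcmd list here is shortened to the EFS fstab lines, and the file system ID in the usage is a placeholder.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(efs_id: str) -> str:
    """Build a multipart/mixed user-data document holding one cloud-config part.

    Shortened version of the template's UserData: only the EFS /home mount lines.
    """
    cloud_config = (
        "packages:\n"
        "- amazon-efs-utils\n"
        "\n"
        "runcmd:\n"
        f'- echo "{efs_id}:/ /home efs tls,_netdev" >> /etc/fstab\n'
        "- mount -a -t efs defaults\n"
    )
    outer = MIMEMultipart("mixed", boundary="==MYBOUNDARY==")
    # Produces the header: Content-Type: text/cloud-config; charset="us-ascii"
    outer.attach(MIMEText(cloud_config, "cloud-config", "us-ascii"))
    return outer.as_string()


if __name__ == "__main__":
    # fs-0123456789abcdef0 is a placeholder, not a real file system ID
    text = build_user_data("fs-0123456789abcdef0")
    print(text)
    # Parse it back to show the part structure cloud-init will see
    parsed = email.message_from_string(text)
    print([part.get_content_type() for part in parsed.get_payload()])
```

Printing the result is a handy way to compare against the template: the blank line after each header block and the closing `--==MYBOUNDARY==--` delimiter are what the MIME format requires.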
Node groups
Create one node group for login and another for compute.
Let's put the new AWS::PCS::ComputeNodeGroup to work.
For the AMI ID, I used the sample AMI.
AWSTemplateFormatVersion: 2010-09-09
Description: Sample template
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'Network'
        Parameters:
          - SubnetIds
      - Label:
          default: 'PCS Cluster'
        Parameters:
          - ClusterId
      - Label:
          default: 'Login node group'
        Parameters:
          - LoginNodeInstanceProfileArn
          - LoginNodeLaunchTemplateId
          - LoginNodeSubnetIds
      - Label:
          default: 'Compute node group'
        Parameters:
          - ComputeNodeInstanceProfileArn
          - ComputeNodeLaunchTemplateId
          - ComputeNodeSubnetIds
Parameters:
  ClusterId:
    Type: String
  LoginNodeInstanceProfileArn:
    Type: String
  LoginNodeLaunchTemplateId:
    Type: String
  LoginNodeSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
  ComputeNodeInstanceProfileArn:
    Type: String
  ComputeNodeLaunchTemplateId:
    Type: String
  ComputeNodeSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
Resources:
  LoginNodeGroup:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      Name: !Sub 'login-node-group'
      # aws-pcs-sample_ami-amzn2-x86_64-slurm-24.05-2024-12-14T05-28-32.441Z at ap-northeast-1
      AmiId: 'ami-0e18e980afc64cc20'
      ClusterId: !Ref ClusterId
      CustomLaunchTemplate:
        Id: !Ref LoginNodeLaunchTemplateId
        Version: 1
      IamInstanceProfileArn: !Ref LoginNodeInstanceProfileArn
      InstanceConfigs:
        - InstanceType: 'c6i.xlarge'
      PurchaseOption: 'ONDEMAND'
      ScalingConfiguration:
        MaxInstanceCount: 1
        MinInstanceCount: 1
      SubnetIds: !Ref LoginNodeSubnetIds
  ComputeNodeGroup:
    Type: AWS::PCS::ComputeNodeGroup
    Properties:
      Name: !Sub 'compute-node-group'
      # aws-pcs-sample_ami-amzn2-x86_64-slurm-24.05-2024-12-14T05-28-32.441Z at ap-northeast-1
      AmiId: 'ami-0e18e980afc64cc20'
      ClusterId: !Ref ClusterId
      CustomLaunchTemplate:
        Id: !Ref ComputeNodeLaunchTemplateId
        Version: 1
      IamInstanceProfileArn: !Ref ComputeNodeInstanceProfileArn
      InstanceConfigs:
        - InstanceType: 'c6i.xlarge'
      PurchaseOption: 'ONDEMAND'
      ScalingConfiguration:
        MaxInstanceCount: 4
        MinInstanceCount: 0
      SubnetIds: !Ref ComputeNodeSubnetIds
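`PurchaseOption` sits at the node group level, and each `InstanceConfigs` entry carries only an `InstanceType`. Copy-paste slips, such as nesting a property one level too deep or leaving a type placeholder like `String` from the reference's syntax section as a value, are easy to make and only surface at deploy time. The Python sketch below is a hypothetical pre-deployment check over a node group's properties written as a plain dict. The allowed purchase options (`ONDEMAND`, `SPOT`) and the single-key `InstanceConfigs` shape are assumptions based on my reading of the AWS::PCS::ComputeNodeGroup reference; confirm them there.

```python
from __future__ import annotations

# Assumptions: verify both against the AWS::PCS::ComputeNodeGroup resource reference
ALLOWED_PURCHASE_OPTIONS = {"ONDEMAND", "SPOT"}
INSTANCE_CONFIG_KEYS = {"InstanceType"}


def validate_node_group(props: dict) -> list[str]:
    """Return a list of problems found in ComputeNodeGroup properties (empty if none)."""
    problems: list[str] = []

    option = props.get("PurchaseOption")
    if option is not None and option not in ALLOWED_PURCHASE_OPTIONS:
        problems.append(f"PurchaseOption {option!r} is not one of {sorted(ALLOWED_PURCHASE_OPTIONS)}")

    scaling = props.get("ScalingConfiguration", {})
    low, high = scaling.get("MinInstanceCount"), scaling.get("MaxInstanceCount")
    if low is None or high is None:
        problems.append("ScalingConfiguration needs MinInstanceCount and MaxInstanceCount")
    elif not 0 <= low <= high:
        problems.append(f"expected 0 <= MinInstanceCount <= MaxInstanceCount, got {low} and {high}")

    for index, config in enumerate(props.get("InstanceConfigs", [])):
        if "InstanceType" not in config:
            problems.append(f"InstanceConfigs[{index}] has no InstanceType")
        extra = set(config) - INSTANCE_CONFIG_KEYS
        if extra:
            problems.append(f"InstanceConfigs[{index}] has unexpected keys {sorted(extra)}")
    return problems


if __name__ == "__main__":
    compute = {
        "InstanceConfigs": [{"InstanceType": "c6i.xlarge"}],
        "PurchaseOption": "ONDEMAND",
        "ScalingConfiguration": {"MinInstanceCount": 0, "MaxInstanceCount": 4},
    }
    print(validate_node_group(compute) or "no problems found")
```

With MinInstanceCount 0, the compute node group can scale down to zero instances when the queue is idle, which is why the check allows a minimum of 0.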
Queue
Finally, create the queue, again using the new AWS::PCS::Queue.
AWSTemplateFormatVersion: 2010-09-09
Description: Sample template
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'PCS Cluster'
        Parameters:
          - ClusterId
      - Label:
          default: 'Compute node group'
        Parameters:
          - ComputeNodeGroupId
Parameters:
  ClusterId:
    Type: String
  ComputeNodeGroupId:
    Type: String
Resources:
  Queue:
    Type: AWS::PCS::Queue
    Properties:
      Name: !Sub 'queue-${AWS::StackName}'
      ClusterId: !Ref ClusterId
      ComputeNodeGroupConfigurations:
        - ComputeNodeGroupId: !Ref ComputeNodeGroupId
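Everything appears to have been created successfully, all the way through.
Summary
That's it for "CloudFormation now supports AWS Parallel Computing Service."
I created the stacks one step at a time, but I suspect many of them could be chained together.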
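One way to chain the stacks is to feed each stack's Outputs into the next stack's Parameters. The Python sketch below shows only that mapping step on plain dictionaries, so it runs offline. In practice the outputs would come from boto3's CloudFormation `describe_stacks` (each output has `OutputKey` and `OutputValue`), and the result would be passed to `create_stack` as its `Parameters` list of `ParameterKey`/`ParameterValue` pairs. `outputs_to_dict` and `build_parameters` are hypothetical helpers, and the subnet and security group IDs in the usage are placeholders.

```python
from __future__ import annotations


def outputs_to_dict(outputs: list[dict[str, str]]) -> dict[str, str]:
    """Flatten describe_stacks-style Outputs into {OutputKey: OutputValue}."""
    return {output["OutputKey"]: output["OutputValue"] for output in outputs}


def build_parameters(mapping: dict[str, list[str]], available: dict[str, str]) -> list[dict[str, str]]:
    """Build create_stack-style Parameters from upstream outputs.

    mapping: parameter name -> output keys whose values are joined with ','
    (CloudFormation takes List<...> parameter values as comma-delimited strings).
    """
    params = []
    for param_key, output_keys in mapping.items():
        missing = [key for key in output_keys if key not in available]
        if missing:
            raise KeyError(f"{param_key}: missing outputs {missing}")
        value = ",".join(available[key] for key in output_keys)
        params.append({"ParameterKey": param_key, "ParameterValue": value})
    return params


if __name__ == "__main__":
    # Placeholder values shaped like the Outputs of the pcs-blog and pcs-blog-sg stacks
    vpc_outputs = [{"OutputKey": "DefaultPrivateSubnet", "OutputValue": "subnet-0aaa"}]
    sg_outputs = [{"OutputKey": "ClusterSecurityGroupId", "OutputValue": "sg-0bbb"}]
    available = {**outputs_to_dict(vpc_outputs), **outputs_to_dict(sg_outputs)}
    # Parameters for the cluster stack (SubnetIds and SecurityGroupIds)
    print(build_parameters(
        {"SubnetIds": ["DefaultPrivateSubnet"], "SecurityGroupIds": ["ClusterSecurityGroupId"]},
        available,
    ))
```

Alternatively, main.yaml already exports its outputs (with the stack name pcs-blog, for example `pcs-blog-DefaultPrivateSubnet`), so a downstream template could read them with `!ImportValue` instead of taking parameters.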
A welcome update that should make testing much easier. This was Takakuni (@takakuni_) from the Consulting Department, AWS Business Division!